Dataset statistics
| Number of variables | 28 |
|---|---|
| Number of observations | 5043 |
| Missing cells | 2407 |
| Missing cells (%) | 1.7% |
| Duplicate rows | 45 |
| Duplicate rows (%) | 0.9% |
| Total size in memory | 1.1 MiB |
| Average record size in memory | 224.0 B |
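The percentages in this table can be rechecked from the raw counts. The sketch below assumes that "Missing cells (%)" is taken over all cells (variables × observations) and that the total size is roughly the average record size times the row count; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 28
n_observations = 5043
missing_cells = 2407
duplicate_rows = 45
avg_record_bytes = 224.0

# Assumption: the missing-cell share is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: total size is about (average record size) x (row count).
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Total size: {total_mib:.1f} MiB")
```

Under these assumptions the rounded results agree with the reported 1.7%, 0.9%, and 1.1 MiB.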
Variable types
| Categorical | 3 |
|---|---|
| Text | 9 |
| Numeric | 16 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 45 (0.9%) duplicate rows | Duplicates |
| color is highly imbalanced (75.0%) | Imbalance |
| language is highly imbalanced (88.8%) | Imbalance |
| content_rating is highly imbalanced (50.4%) | Imbalance |
| director_name has 104 (2.1%) missing values | Missing |
| director_fb_likes has 104 (2.1%) missing values | Missing |
| gross has 677 (13.4%) missing values | Missing |
| plot_keywords has 153 (3.0%) missing values | Missing |
| content_rating has 303 (6.0%) missing values | Missing |
| budget has 406 (8.1%) missing values | Missing |
| title_year has 108 (2.1%) missing values | Missing |
| aspect_ratio has 329 (6.5%) missing values | Missing |
| budget is highly skewed (γ1 = 48.57751667) | Skewed |
| director_fb_likes has 907 (18.0%) zeros | Zeros |
| actor_3_fb_likes has 89 (1.8%) zeros | Zeros |
| facenumber_in_poster has 2152 (42.7%) zeros | Zeros |
| actor_2_fb_likes has 55 (1.1%) zeros | Zeros |
| movie_fb_likes has 2181 (43.2%) zeros | Zeros |
Reproduction
| Analysis started | 2024-04-11 08:25:56.538880 |
|---|---|
| Analysis finished | 2024-04-11 08:26:42.407698 |
| Duration | 45.87 seconds |
| Software version | ydata-profiling v4.7.0 |
| Download configuration | config.json |
color
Categorical
IMBALANCE 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 19 |
| Missing (%) | 0.4% |
| Memory size | 39.5 KiB |
| Color | 4815 |
|---|---|
| Black and White | 209 |
Length
| Max length | 16 |
|---|---|
| Median length | 5 |
| Mean length | 5.4576035 |
| Min length | 5 |
Characters and Unicode
| Total characters | 27419 |
|---|---|
| Distinct characters | 16 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Color |
|---|---|
| 2nd row | Color |
| 3rd row | Color |
| 4th row | Color |
| 5th row | Color |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Color | 4815 | 95.5% |
| Black and White | 209 | 4.1% |
| (Missing) | 19 | 0.4% |
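The IMBALANCE flag on this variable reports 75.0%. The report does not state how that score is defined. The sketch below assumes a normalized-entropy definition, 1 − H / log2(k), with H the base-2 Shannon entropy of the class proportions and k the number of classes, and with missing rows excluded. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k): 0 for perfectly balanced classes, near 1 for one dominant class."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    # Base-2 Shannon entropy of the class proportions; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts for "Color" and "Black and White"; the 19 missing rows are left out (assumption).
color_counts = [4815, 209]
score = imbalance_score(color_counts)
print(f"{score:.1%}")
```

With these two counts the assumed formula gives a value that rounds to the reported 75.0%, which suggests, but does not prove, that this is the definition in use.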
Length
Histogram of lengths of the category
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
|---|---|---|
| color | 4815 | 88.5% |
| black | 209 | 3.8% |
| and | 209 | 3.8% |
| white | 209 | 3.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| o | 9630 | 35.1% |
| l | 5024 | 18.3% |
| C | 4815 | 17.6% |
| r | 4815 | 17.6% |
| (space) | 627 | 2.3% |
| a | 418 | 1.5% |
| B | 209 | 0.8% |
| c | 209 | 0.8% |
| k | 209 | 0.8% |
| n | 209 | 0.8% |
| Other values (6) | 1254 | 4.6% |
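The character table can be rebuilt from the two category values and their row counts. The sketch below assumes the raw strings are "Color" and " Black and White" with a leading space. That leading space is an inference from the reported maximum length of 16 and the 627 space characters (3 × 209), not something the report states.

```python
from collections import Counter

# Assumed raw values and row counts; the leading space is an inference (see lead-in).
value_counts = {"Color": 4815, " Black and White": 209}

char_counts = Counter()
for value, n_rows in value_counts.items():
    # Each character's count in the value, scaled by the number of rows holding that value.
    for ch, n in Counter(value).items():
        char_counts[ch] += n * n_rows

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

for ch, n in char_counts.most_common(5):
    print(repr(ch), n, f"{100 * n / total_chars:.1f}%")
print("total characters:", total_chars, "mean length:", round(mean_length, 7))
```

Under that assumption the totals line up with the reported 27419 characters, 16 distinct characters, and a mean length of about 5.4576.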
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27419 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27419 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27419 | 100.0% |
director_name
Text
MISSING 
| Distinct | 2398 |
|---|---|
| Distinct (%) | 48.6% |
| Missing | 104 |
| Missing (%) | 2.1% |
| Memory size | 39.5 KiB |
Length
| Max length | 32 |
|---|---|
| Median length | 24 |
| Mean length | 13.084835 |
| Min length | 3 |
Characters and Unicode
| Total characters | 64626 |
|---|---|
| Distinct characters | 76 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 1504 |
|---|---|
| Unique (%) | 30.5% |
Sample
| 1st row | James Cameron |
|---|---|
| 2nd row | Gore Verbinski |
| 3rd row | Sam Mendes |
| 4th row | Christopher Nolan |
| 5th row | Doug Walker |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| john | 180 | 1.8% |
| david | 150 | 1.5% |
| michael | 127 | 1.2% |
| james | 87 | 0.8% |
| peter | 85 | 0.8% |
| robert | 84 | 0.8% |
| paul | 81 | 0.8% |
| richard | 80 | 0.8% |
| scott | 65 | 0.6% |
| lee | 58 | 0.6% |
| Other values (2966) | 9277 | 90.3% |
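The frequencies in this word table appear to be shares of all word tokens rather than shares of rows. The sketch below checks that reading. It assumes the denominator is the sum of the listed counts plus the "Other values" row; the names `word_counts` and `other_values` are illustrative.

```python
# Counts copied from the word table above.
word_counts = {
    "john": 180, "david": 150, "michael": 127, "james": 87, "peter": 85,
    "robert": 84, "paul": 81, "richard": 80, "scott": 65, "lee": 58,
}
other_values = 9277  # the "Other values (2966)" remainder row

# Assumption: percentages are relative to the total number of word tokens.
total_words = sum(word_counts.values()) + other_values

for word, n in word_counts.items():
    print(f"{word:8s} {n:5d} {100 * n / total_words:.1f}%")
print(f"{'other':8s} {other_values:5d} {100 * other_values / total_words:.1f}%")
```

With a token total of 10274, every listed percentage is reproduced after rounding. Dividing by the 5043 rows instead would give 3.6% for "john", which does not match the table.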
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 6097 | 9.4% |
| (space) | 5335 | 8.3% |
| a | 5278 | 8.2% |
| n | 4658 | 7.2% |
| r | 4447 | 6.9% |
| o | 3794 | 5.9% |
| i | 3693 | 5.7% |
| l | 2970 | 4.6% |
| t | 2321 | 3.6% |
| s | 2089 | 3.2% |
| Other values (66) | 23944 | 37.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 64626 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 64626 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 64626 | 100.0% |
num_critic_for_reviews
Real number (ℝ)
| Distinct | 528 |
|---|---|
| Distinct (%) | 10.6% |
| Missing | 50 |
| Missing (%) | 1.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 140.19427 |
| Minimum | 1 |
| Maximum | 813 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 39.5 KiB |
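The two percentages in this table seem to use different denominators. The sketch below assumes "Missing (%)" is relative to all rows and "Distinct (%)" is relative to non-missing rows only. Both are assumptions tested against the reported figures, and the variable names are illustrative.

```python
# Figures copied from the num_critic_for_reviews table and the dataset statistics.
n_rows = 5043
missing = 50
distinct = 528

non_missing = n_rows - missing

# Assumption: missing share over all rows, distinct share over non-missing rows.
missing_pct = 100 * missing / n_rows
distinct_pct = 100 * distinct / non_missing

# For comparison: the distinct share if all rows were used as the denominator.
distinct_pct_all_rows = 100 * distinct / n_rows

print(f"Missing: {missing_pct:.1f}%")
print(f"Distinct (non-missing rows): {distinct_pct:.1f}%")
print(f"Distinct (all rows): {distinct_pct_all_rows:.1f}%")
```

Only the non-missing denominator reproduces the reported 10.6%. Using all 5043 rows gives 10.5%.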